In this study, two structural models (denoted as MproST and MproSH) of the main proteinase (Mpro) from the novel coronavirus associated with severe acute respiratory syndrome (SARS-CoV) were constructed based on the crystallographic structures of Mpro from transmissible gastroenteritis coronavirus (TGEV) (MproT) and human coronavirus HCoV-229E (MproH), respectively. Several 200-ps molecular dynamics simulations were subsequently performed to investigate the dynamic behavior of key structural features. Both MproST and MproSH exhibit folds similar to those of their respective template proteins. These structural models reveal three distinct functional domains, as well as an intervening loop connecting domains II and III, as found in both template proteins. In addition, domain III of these structures exhibits the least conservation of secondary structure. A catalytic cleft between domains I and II, containing the substrate binding subsites S1 and S2, is also observed in these structural models. Although these structures share many common features, the most significant difference occurs at the S2 subsite, where the amino acid residues lining the subsite are the least conserved. This difference may make it a critical challenge to design anti-SARS drugs simply by screening known databases of proteinase inhibitors.

Keywords: Main proteinase; Coronavirus; Severe acute respiratory syndrome (SARS); Molecular dynamics simulations; Functional domain; Structural model; Inhibitor.


INTRODUCTION 

An outbreak of atypical pneumonia, designated as severe acute respiratory syndrome (SARS), was first reported in Guangdong Province of China in late 2002 and rapidly spread to several countries. Infection by SARS is usually characterized by high fever, malaise, rigor, headache, and nonproductive cough, and may progress to generalized, interstitial infiltrates in the lung. Attempts to identify the etiology of the SARS outbreak were not successful until March 2003, when laboratories in the United States, Canada, Germany, and Hong Kong isolated a novel coronavirus (SARS-CoV) from SARS patients. The complete genome sequence of SARS-CoV was subsequently determined and characterized for two different isolates. 4,5 Phylogenetic analyses and sequence comparisons reveal that SARS-CoV is not closely related to any of the three known groups of coronaviruses, which include two human coronaviruses, HCoV-229E (group I) and HCoV-OC43 (group II), that are responsible for about 30% of mild upper respiratory tract illnesses, in particular the common cold.

Coronaviruses belong to a diverse group of positive-stranded RNA viruses featuring the largest viral RNA genomes known to date (27-31 kb). They share a similar genome organization and common transcriptional and translational processes with the Arteriviridae. 10,11 The human coronavirus HCoV-229E replicase gene encodes two overlapping polyproteins, pp1a (replicase 1a, ~450 kDa) and pp1ab (replicase 1ab, ~750 kDa), that mediate all the functions required for viral replication and transcription. 13 The functional polypeptides are released from the polyproteins by extensive proteolytic processing, which is primarily achieved by the 33.1-kDa HCoV-229E main proteinase (Mpro). 14 Mpro is also commonly called 3C-like proteinase (3CLpro) to indicate the similarity of its cleavage site specificity to that of picornavirus 3C proteinase (3Cpro) and the identification of a Cys residue as the principal nucleophile in the context of a predicted two-β-barrel fold. 15,16 Mpro from HCoV-229E (MproH) has been expressed in Escherichia coli, and the enzymatic properties, inhibitor profile, and substrate specificity of the purified protein have been well characterized. 14,17

Several studies have revealed significant differences in both the active sites and the domain structures of coronavirus Mpro and picornavirus 3Cpro. It is noteworthy that coronavirus Mpro possesses a large C-terminal domain of ~110 amino acid residues (domain III) that is not found in other RNA virus 3CLpro. Deletion of this domain results in dramatic losses of proteolytic activity, suggesting that the C-terminal domain III of Mpro contributes to proteolytic activity through undefined mechanisms. Previous experimental data have shown that the differential cleavage kinetics of sites within pp1a/pp1ab are a conserved feature of coronavirus Mpro and that similar processing kinetics can be predicted for the replicase polyproteins of all coronaviruses. 22 Furthermore, as deduced from the genome sequence, the cleavage pattern appears to be conserved between Mpro from SARS-CoV (MproS) and Mpro from other coronaviruses. The functional importance of Mpro in the viral life cycle has made this proteinase an attractive target for the development of drugs directed against SARS and other coronavirus infections. Moreover, molecular modeling has suggested that available rhinovirus 3Cpro inhibitors, such as compound AG7088, could be modified and tested for SARS therapy. Therefore, screening known proteinase inhibitor libraries may be a valuable shortcut to discovering anti-SARS drugs.

Recently, the crystal structures of MproH 23 and of Mpro from porcine coronavirus (transmissible gastroenteritis virus, TGEV) (MproT) complexed with an inhibitor 26 have been determined. In addition, homology models of MproS based on the crystal structures of MproH 23 and MproT 25 have also been reported. Comparison of these structures reveals a remarkable degree of conservation of the substrate binding sites, which is further supported by the cleavage of an MproT substrate by recombinant MproS. 23 In addition, MproS shows 40 and 44% sequence identity to MproH and MproT, respectively. Although the deduced genome sequence of SARS-CoV indicates that it belongs to a new group of coronaviruses, 24 the significantly higher sequence identity of MproS to bovine coronavirus (BCoV) Mpro (49%) and mouse hepatitis virus (MHV) Mpro (50%) from group II coronaviruses has allowed Anand et al. to recognize it as an outlier among group II coronaviruses.

Molecular dynamics (MD) simulations at the atomic level have been widely applied to gain insight into the structure-function relationships of proteins. Our group has previously conducted MD simulations and molecular docking studies on various target proteins. In this paper, two structural models of MproS (denoted as MproSH and MproST) were constructed by the comparative approach based on the crystallographic structures of MproH and MproT, respectively. In addition, MD simulations were conducted to investigate the dynamic behavior of these structures. Beyond the continued characterization of Mpro from various coronaviruses, the amino acid sequence alignment and structural homology analyses of MproS presented in this study provide a starting point for further structure-based studies, such as of folding/unfolding mechanisms and molecular docking, which are currently being carried out in our group.

METHODS 
Model proteins 

The structural models of MproS (MproST and MproSH) were built by structural homology using the monomer of the three-dimensional (3D) structure of MproT, refined to 1.96 Å resolution 26 (Fig. 1(A)), and that of MproH, solved at 2.54 Å resolution (Fig. 1(B)), both obtained from the Protein Data Bank (PDB; accession numbers 1LVO and 1P9U, respectively). The inhibitor, a substrate-analog hexapeptidyl chloromethyl ketone, was removed from the crystallographic structure of MproT before it was used as a template. Unfavorable nonphysical contacts in these structures were then eliminated using the Biopolymer module of the Insight II program (Accelrys, San Diego, CA, USA) with the Discover CVFF (consistent valence force field) 34-36 on an SGI O200 workstation with a 64-bit MIPS RISC R12000 2 × 270 MHz CPU and a PMC-Sierra RM7000A 350 MHz processor (Silicon Graphics, Inc., Mountain View, CA, USA), followed by 10,000 steps of energy minimization using the steepest descent method, to yield the model proteins for further structure building.
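The steepest descent step used to relieve bad contacts can be illustrated with a minimal sketch. It is not Discover's implementation: it assumes a fixed step size (the step control used in Insight II is not described here), and it uses a hypothetical two-atom harmonic bond (`bond_gradient`, with assumed `r0` and `k`) as a stand-in for the CVFF energy.

```python
import numpy as np

def steepest_descent(grad, x0, step=0.1, max_steps=10_000, gtol=1e-6):
    """Move coordinates downhill along -grad(x) until the gradient is small.

    A fixed step size is an assumption; production codes usually adapt it.
    """
    x = np.array(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < gtol:  # converged: residual forces nearly zero
            break
        x -= step * g
    return x

def bond_gradient(x, r0=1.5, k=1.0):
    """Gradient of E = k * (|r1 - r2| - r0)**2 for a hypothetical two-atom bond.

    x holds the flattened coordinates (x1, y1, z1, x2, y2, z2).
    """
    r1, r2 = x[:3], x[3:]
    d = np.linalg.norm(r1 - r2)
    g1 = 2.0 * k * (d - r0) * (r1 - r2) / d
    return np.concatenate([g1, -g1])

# Start with the bond stretched to 2.0; minimization relaxes it toward r0 = 1.5.
x_min = steepest_descent(bond_gradient, [0.0, 0.0, 0.0, 2.0, 0.0, 0.0])
bond_length = float(np.linalg.norm(x_min[:3] - x_min[3:]))
```

With this toy potential each step removes a constant fraction of the strain, so convergence is geometric; on a real protein energy surface steepest descent is fast at first but slows near the minimum, which is why the Methods follow it with Newton-Raphson iterations.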

Structural homology 

Structural homology exploits structural and sequence similarities to predict unknown protein structures. The Homology module in Insight II allows the 3D model of a target protein (i.e., MproS) to be built using both its amino acid sequence and the structures of known, related model proteins (i.e., MproH and MproT). The Homology program simultaneously optimizes structural and sequence homology for multiple proteins in a 3D graphics environment, based on a method developed by Greer. Smith-Waterman pairwise amino acid sequence alignments were performed, anchored on the conserved active site and substrate binding subsites among Mpro from various coronaviruses, to locate the active site and substrate binding subsites of MproS. The consensus structurally conserved regions (SCRs) of the target protein were generated from alignments of the target protein to the model proteins. The atomic coordinates were then transferred from the model proteins to the target protein in each SCR using the Mutation Matrix module of the Insight II program. Automatic loop building was performed either by database searching or by generation through a random conformational search. The coordinates at the N- and C-termini of these loops were then automatically assigned. Side chains of the target protein were automatically replaced, preserving the conformations of the model proteins. The side-chain conformations were optimized either manually or automatically using a rotamer library. 24 Similar secondary structure motifs were identified by database searching and assigned by DSSP. 40 The bond lengths and torsion angles in the SCRs and loop regions were repaired and relaxed using Homology/Refine/SpliceRepair and Homology/Refine/Relax, respectively. The newly built structures of the target protein were then extensively refined to remove van der Waals overlaps, unfavorable interatomic distances, and undesirable torsion angles using the molecular mechanics and dynamics features of the Discover module.

Fig. 1. The X-ray crystallographic structures of (A) MproT and (B) MproH and the structural models of (C) MproST and (D) MproSH, visualized with the Insight II program. The N- and C-termini are indicated. α-Helices are shown as red cylinders, and β-strands as arrows pointing from the N- to the C-terminus. Polypeptide backbones belonging to turn and random coil regions are shown in blue and green, respectively. The general acid-base catalyst His residue and the nucleophilic Cys residue are labeled. The locations of the putative substrate binding subsites S1 and S2 are indicated.
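For readers unfamiliar with it, the Smith-Waterman local alignment mentioned above can be sketched in a few lines. This is a minimal illustration with assumed scoring: a flat match/mismatch score and a linear gap penalty. The scoring scheme used inside Insight II is not given here; protein alignments normally use a substitution matrix such as BLOSUM together with affine gap penalties.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
```

The zero floor in the recurrence is what makes the alignment local: poorly matching flanks are dropped instead of dragging the score down, which suits anchoring on conserved catalytic regions.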

Molecular dynamics simulations 

The crystallographic structures of MproH and MproT and the structural models MproSH and MproST were subjected to energy minimization by the steepest descent method (3,000 iterations) followed by the Newton-Raphson method (5,000 iterations); the resulting energy-minimized structures served as the starting points for further structural comparison. Each energy-minimized structure was then placed at the center of a 50 × 60 × 85 Å lattice filled with 6,222, 5,866, 5,836, and 5,776 water molecules for the MproH, MproT, MproSH, and MproST systems, respectively. Each protein-water system was then equilibrated by 20,000 steps of steepest descent minimization followed by 10 ps of dynamics. The explicit image periodic boundary condition (PBC) was used for solvent equilibration. At the end of the explicit image equilibration, Discover re-images any molecule whose center of mass has moved out of the lattice, maintaining the integrity of the lattice at a relatively constant density. A cut-off of 14 Å was used to calculate long-range electrostatic interactions. Finally, a 200-ps MD simulation was carried out for each system using the Discover module of Insight II. The temperature and pressure of each MD simulation were maintained by weakly coupling the system to a heat bath at 300 K and to an external pressure bath at 1 atm with a coupling constant of 0.5 ps, according to the method described by Berendsen et al. A cut-off radius of 10 Å for the nonbonded interactions was applied to each MD simulation. The time step of the MD simulations was 1 fs. The trajectories and coordinates of these structures were recorded every 2 ps for further analysis.
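The Berendsen weak-coupling thermostat and the re-imaging step described above can be sketched as follows. This is an illustration under stated assumptions, not Discover's code: the function names are hypothetical, the lattice is assumed to span [0, L) along each axis, and only the temperature scaling (not the pressure coupling) is shown. With the 1-fs time step and 0.5-ps coupling constant from the text, each step corrects only dt/tau = 0.2% of the temperature deviation, which is what makes the coupling "weak".

```python
import math
import numpy as np

def berendsen_scale(t_inst, t_bath=300.0, dt_ps=0.001, tau_ps=0.5):
    """Velocity scaling factor lambda of the Berendsen weak-coupling thermostat.

    Multiplying all velocities by lambda moves the instantaneous temperature
    t_inst toward t_bath by a fraction dt/tau of the deviation per step.
    """
    return math.sqrt(1.0 + (dt_ps / tau_ps) * (t_bath / t_inst - 1.0))

def reimage(coords, masses, box):
    """Translate a whole molecule by lattice vectors so its center of mass lies in [0, box).

    Assumes an orthorhombic box with one corner at the origin.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    shift = np.floor(com / box) * box  # whole-box translations only
    return coords - shift

box = np.array([50.0, 60.0, 85.0])  # lattice size from the text, in angstroms
molecule = np.array([[51.0, 1.0, 1.0], [53.0, 1.0, 1.0]])  # hypothetical two-atom molecule
wrapped = reimage(molecule, [1.0, 1.0], box)
```

Re-imaging whole molecules by their center of mass, rather than wrapping atoms individually, keeps each water molecule intact across the periodic boundary.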

Structural analyses 

Although more elaborate algorithms have been proposed to measure the structural similarity between proteins, 42,43 the root-mean-square deviation (RMSD) remains the simplest measure for closely related proteins. 44 For each MD simulation, the RMSDs of the trajectory frames recorded at 2-ps intervals over the 200-ps run were calculated for the backbone Cα atoms of the entire protein, of domains I, II, and III, and of the substrate binding subsites S1 and S2, with reference to the respective starting structure, according to Koehl. 45 The RMSDs were calculated after optimal superposition of the coordinates to remove translational and rotational motion. 46 Secondary structures were assigned using DSSP, 40 which recognizes hydrogen-bond patterns and correlates them with geometrical features; the default hydrogen-bond energy criterion of -0.5 kcal/mol was used. The accessible surface areas (ASAs) of the substrate binding subsites S1 and S2, and the linear distance between the sulfur atom of the nucleophilic Cys residue and the Nε2 atom of the general acid-base catalyst His residue, were also recorded for each structure as a function of MD simulation time.
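RMSD after optimal superposition is commonly computed with the Kabsch (SVD) algorithm. The sketch below assumes that algorithm (the text does not name the superposition method used by Discover) and takes Cα coordinates as plain (N, 3) NumPy arrays. `rmsd_series` and its index-based atom selection are illustrative helpers, for example for restricting the calculation to one domain.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0  # forbid improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation that maps P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_series(frames, reference, idx=None):
    """RMSD of each trajectory frame to the reference, optionally over a subset of atoms."""
    sel = slice(None) if idx is None else idx
    ref = np.asarray(reference, dtype=float)[sel]
    return [kabsch_rmsd(np.asarray(f, dtype=float)[sel], ref) for f in frames]
```

For example, if the Cα rows are ordered by residue number starting at residue 1, `idx = np.arange(7, 100)` would select residues 8-100 (domain I), so each domain is superimposed on its own before its RMSD is computed.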

RESULTS AND DISCUSSION 

Amino acid sequence alignment 

Point, insertion, or deletion mutations that would result in a critical loss of biological function are disfavored by evolution; consequently, functionally and structurally relevant domains tend to be highly conserved across a protein family. Such conservation can be detected as a pattern of conserved residues that would be unlikely to have occurred by chance. Therefore, an optimal amino acid sequence alignment based on the conserved residues is essential to the success of structural homology modeling. The amino acid sequence alignment of MproS with MproT and MproH is given in Fig. 2. MproT, MproH, and MproS contain 301, 300, and 306 residues, respectively. The residue corresponding to Ala46 in domain I of MproS, and those corresponding to Asp248, Ile249, and Gln273 in domain III of MproS, are missing in both MproT and MproH. In addition, MproS has one and two extra residues at the C-terminus compared with MproT and MproH, respectively. There are 33, 33, and 29 totally conserved residues in domains I, II, and III, respectively, among these Mpro, indicating that domain III exhibits the highest sequence variation of the three domains. It has been predicted earlier that the active site of coronavirus Mpro is similar to that of picornavirus 3Cpro. 16 Both the general acid-base catalyst and the nucleophile residues are totally conserved in these three proteins: the general acid-base catalyst His41 is located in a highly conserved signature sequence (LNGLWLXDXVXCPRHVI) of domain I, and the nucleophilic Cys144 of MproT and MproH (Cys145 of MproS) is located in the highly conserved signature sequence (TIXGSFXXGXCGSXG) of domain II, where X denotes a nonconserved residue.
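The X-containing signature sequences above translate directly into regular expressions, which is one simple way to locate the catalytic His and Cys in a new Mpro sequence. This is an illustrative sketch only: the helper names are hypothetical, and the test fragments are made up rather than taken from a real Mpro sequence.

```python
import re

HIS_MOTIF = "LNGLWLXDXVXCPRHVI"  # domain I signature containing the catalytic His
CYS_MOTIF = "TIXGSFXXGXCGSXG"    # domain II signature containing the catalytic Cys

def find_motif(sequence, motif):
    """Return (0-based start, matched text) of the first match; X matches any residue."""
    match = re.search(motif.replace("X", "."), sequence)
    return None if match is None else (match.start(), match.group())

def catalytic_position(sequence, motif, residue):
    """1-based position in `sequence` of the first `residue` ('H' or 'C') within the motif."""
    hit = find_motif(sequence, motif)
    if hit is None:
        return None
    return hit[0] + motif.index(residue) + 1
```

Because X is mapped to the regex wildcard `.`, only the conserved letters constrain the match, mirroring how the signatures are defined in the text.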

Table 1 lists the percentages of amino acid identity among these proteins. MproT and MproH show the highest total amino acid identity (60.80%), whereas MproH and MproS exhibit the lowest (40.19%). In addition, domain II has the highest and domain III the lowest amino acid identity among these three proteins. MproS shows slightly higher amino acid identity to MproT than to MproH, indicating that the structure of MproS may be more similar to that of MproT than to that of MproH. These results are in good agreement with the previous finding that MproS shows 40 and 44% sequence identity to MproH and MproT, respectively. Although the significantly higher sequence identity of MproS to BCoV Mpro (49%) and MHV Mpro (50%) from group II coronaviruses has allowed Anand et al. to recognize it as an outlier among group II coronaviruses, the low sequence identities between MproS and MproT and between MproS and MproH found in the present study strongly support the results of Marra et al., 4 in which SARS-CoV was classified as a new group of coronavirus based on analysis of the deduced genome sequence.

Table 1. The amino acid sequence identities among MproH, MproT, and MproS

                       Identity (%)
Protein pair           Total    Domain I    Domain II    Domain III
MproH and MproT        60.80    63.44       65.06        55.45
MproH and MproS        40.19    41.94       45.78        35.64
MproT and MproS        43.85    44.09       49.40        39.22
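Percent identity values such as those in Table 1 are computed from a pairwise alignment, but the result depends on the denominator, and the text does not state which convention was used. The sketch below (hypothetical helper, toy aligned sequences) makes that choice explicit.

```python
def percent_identity(aln_a, aln_b, denominator="shorter"):
    """Percent identity between two equal-length aligned sequences ('-' = gap).

    denominator: 'shorter' -> length of the shorter ungapped sequence;
                 'aligned' -> number of columns where neither sequence has a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    if denominator == "shorter":
        n = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    elif denominator == "aligned":
        n = sum(1 for x, y in zip(aln_a, aln_b) if x != "-" and y != "-")
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * identical / n

# Toy alignment: 4 identical columns, 5 residues in each sequence, 4 gap-free columns.
by_length = percent_identity("AC-DEF", "ACGD-F")               # 80.0
by_columns = percent_identity("AC-DEF", "ACGD-F", "aligned")   # 100.0
```

The same alignment gives 80% or 100% depending on the convention, so identity percentages from different studies are only directly comparable when the denominator is stated.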

The Structural Models of MproST and MproSH

Recently, two 3D models of MproS have been constructed by homology modeling based on the crystallographic structures of MproH and MproT. Virtual screening was then performed by molecular docking against both models to identify possible 3CLpro inhibitors from small-molecule databases for SARS therapy. The level of similarity between MproS and both MproT and MproH allowed us to construct two structural models of MproS (denoted as MproST and MproSH) by the comparative approach; the results are illustrated in Figs. 1(C) and 1(D). MproS contains three insertions of one or two residues relative to both structural templates (Fig. 2); as expected, these are all located in loops and do not present a problem in model building. Both MproST and MproSH exhibit three distinct domains, indicating that they adopt folds similar to those of MproT and MproH, respectively. However, the secondary structures of both MproST and MproSH assigned by DSSP 40 were less conserved than those of MproT (Fig. 1(A)) and MproH (Fig. 1(B)), particularly in domain III. This result is consistent with the amino acid sequence alignment, which shows that domain III has the lowest sequence identity among these proteins compared with domains I and II. It further implies that some of the main-chain or side-chain hydrogen bonds in the constructed homology models may be disrupted in order to maintain folds similar to those of the model proteins.

Fig. 2. Amino acid sequence alignment of MproT, MproH, and MproS. Secondary structures as defined in the crystallographic structure of MproT are shown on top. The first and last residue numbers of each sequence are given in brackets on the left and right, respectively. Residues totally conserved in all sequences are shown in red letters on a green background. Residues conserved in MproT and MproH but different in MproS are shown in black letters on a yellow background. Residues where variations occur are shown in blue or brown letters on a grey background. Amino acid residues missing in both MproT and MproH are shown as dashes.

The putative substrate binding subsites S1 and S2 of MproST and MproSH are located in a cleft between domains I and II and are nearly identical to those of MproT and MproH (Fig. 1). This suggests that MproS may follow substrate binding mechanisms similar to those of MproT and MproH, which would allow anti-SARS drugs to be designed by screening known proteinase inhibitors. Whereas domains I and II are separated by the catalytic cleft, domains II and III are loosely connected in all structures by a long loop (residues 184 to 199 in both MproT and MproH and residues 185 to 200 in MproS). Domain III, a globular cluster of 5, 5, 4, and 2 helices in MproT, MproH, MproST, and MproSH, respectively (Fig. 1), has been implicated in the proteolytic activity of Mpro. Strictly speaking, domain III of MproST and MproSH contains only 3 and 1 helices, respectively, because helix AIII is broken into two parts in both models. Comparing the two crystallographic structures, MproT and MproH, with the two homology models, MproST and MproSH, we found that domain I of MproS is more similar to that of MproH, whereas domains II and III of MproS are more similar to those of MproT. The low sequence identity and secondary-structure similarity of domain III among these proteins found in the present study, together with previous findings that recombinant Mpro from IBV, MHV, and HCoV lacking 33, 28, and 34 C-terminal residues, respectively, showed dramatic losses of proteolytic activity, suggest that domain III may contribute to proteolytic activity indirectly, through an undefined mechanism.

The RMSD analysis (Table 2) shows that the structure of MproH is very similar to that of MproT, 26 with an RMSD of 2.01 Å between the two structures over all 300 Cα positions. The MproH used in this work lacks two amino acid residues at the C-terminus; nevertheless, it has the same enzymatic properties as full-length MproH but yields far superior crystals. In the structure of full-length MproH, residues 301 and 302 are disordered and not seen in the electron density. 23 Both HCoV-229E and TGEV are group I coronaviruses, 47 and their Mpro share 60.8% sequence identity (Table 1). As shown in Table 2, the RMSDs of the constructed models MproSH and MproST relative to their corresponding templates, MproH and MproT, are 4.84 and 3.94 Å, respectively, while the RMSD between MproSH and MproST is 5.78 Å. This indicates that the structure of MproS is more similar to that of MproT than to that of MproH, and further suggests that choosing a more closely related template protein yields a more accurate structural model of MproS.

Table 2. The RMSDs between the template proteins, MproH and MproT, and the structural models, MproSH and MproST

RMSD (Å)     MproH    MproT    MproSH
MproT        2.01     -        -
MproSH       4.51     3.94     -
MproST       4.84     4.37     5.78

Molecular dynamics simulations 

The six monomers of MproT 26 and MproH 23 present in the asymmetric unit are arranged as three dimers. Each monomer is folded into three distinct domains, the first two of which are antiparallel β-barrels reminiscent of those found in serine proteinases of the chymotrypsin family. Residues 8-100 form domain I, and residues 101-183 make up domain II (Fig. 2). The connection to domain III is formed by a long loop comprising residues 184-199. Domain III, composed of residues 200-302 (Fig. 2), contains a novel arrangement of five α-helices (Fig. 1(A) and (B)). A deep cleft between domains I and II, lined by hydrophobic residues, constitutes the substrate binding subsites. The catalytic site is situated at the center of the cleft. To investigate the dynamic behavior of MproT, MproH, MproST, and MproSH, MD simulations of these proteins in explicit water were conducted at 300 K. The overall structural changes were evaluated by plotting the RMSDs of these proteins, relative to the positions in the corresponding starting structures, as a function of simulation time (Fig. 3). During the MD time course, these structures remained considerably stable, with RMSDs within 3 Å. Domain III clearly exhibited more structural variation than the other two domains in all cases. The substrate binding subsite S1 of MproST maintained its structural integrity during the entire MD time course, whereas subsite S2 of MproST exhibited more structural variation. The higher structural variation of S2 may make it flexible enough to accommodate a bulky hydrophobic residue of the substrate.

The secondary structure propensity of these proteins was assigned by DSSP 40 throughout the MD simulations, and the results are shown in Fig. 4. As expected, both MproT and MproH exhibited higher secondary structure stability than MproST and MproSH. The interior of the β-barrel of domain I consists entirely of hydrophobic residues. A short α-helix (helix AI; Tyr53-Ser58) closes the barrel like a lid. 23,26 However, this short α-helix is unstable in both MproT and MproH during the 200-ps MD simulations (Fig. 4(A) and (B)), and it is missing altogether in MproST and MproSH (Fig. 4(C) and (D)). Domain II is smaller than domain I and also smaller than the homologous domain II of chymotrypsin and hepatitis A virus (HAV) 3Cpro. 48-50 Several secondary structure elements of HAV 3Cpro are missing in both MproT and MproH. Domain III is composed of five mostly antiparallel α-helices and the loops connecting them. Interhelical contacts are mediated by hydrophobic side chains. Database searches did not reveal other proteins or protein domains with the same topology as domain III. The present homology models show that some of the secondary structures of MproT and MproH are missing in domain III. It is possible that, in order to maintain folds similar to those of the template proteins, some of the main-chain and side-chain hydrogen-bonding patterns of both MproST and MproSH were lost, resulting in a loss of secondary structure content. Our results again suggest that domain III of these Mpro plays a role in proteolytic activity through an undefined mechanism, regardless of its structural integrity.
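The -0.5 kcal/mol criterion given in the Methods refers to DSSP's electrostatic hydrogen-bond energy (Kabsch and Sander), in which partial charges of 0.42e on C=O and 0.20e on N-H interact with a conversion factor of 332 kcal·Å/mol. The sketch below evaluates that formula for one backbone C=O ... H-N pair. The coordinates are made up and the function names are hypothetical; the point is only to show how distorting backbone geometry pushes the energy above the cutoff, so that helices drop out of the DSSP assignment.

```python
import numpy as np

F_DSSP = 332.0       # kcal*Å/mol, electrostatic conversion factor used by DSSP
Q1, Q2 = 0.42, 0.20  # partial charges (units of e) on C=O and N-H

def _dist(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def dssp_hbond_energy(c, o, n, h):
    """DSSP energy (kcal/mol) between the C=O of one residue and the N-H of another."""
    return Q1 * Q2 * F_DSSP * (1.0 / _dist(o, n) + 1.0 / _dist(c, h)
                               - 1.0 / _dist(o, h) - 1.0 / _dist(c, n))

def is_hbond(c, o, n, h, cutoff=-0.5):
    """DSSP counts a hydrogen bond when the energy is below the cutoff (default -0.5 kcal/mol)."""
    return dssp_hbond_energy(c, o, n, h) < cutoff

# Idealized, collinear C=O...H-N geometry (coordinates in Å, made up for illustration).
C, O, H, N = [0.0, 0.0, 0.0], [1.23, 0.0, 0.0], [3.13, 0.0, 0.0], [4.13, 0.0, 0.0]
e_close = dssp_hbond_energy(C, O, N, H)  # about -2.9 kcal/mol
# Same donor pulled 10 Å further away: the energy is close to zero.
e_far = dssp_hbond_energy(C, O, [14.13, 0.0, 0.0], [13.13, 0.0, 0.0])
```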

Active site 

The active site of coronavirus Mpro is similar to that of picornavirus 3Cpro, as had been predicted previously. 16 The mutual arrangement of the nucleophilic Cys144 and the general acid-base catalyst His41 of MproT is identical to that of Cys172 and His44 of HAV 3Cpro and of Ser195 and His57 of chymotrypsin. 26 The distance between the sulfur atom of Cys144 and the Nε2 atom of His41 in MproT is 4.05 Å, longer than the corresponding Cys-His distances in HAV 3Cpro (3.92 Å), 49 poliovirus (PV) 3Cpro (3.4 Å), 52 and papain (3.65 Å). From a dynamic point of view (Fig. 5), the Cys144-His41 distance of MproH fluctuated more rapidly than that of MproT. In addition, the Cys145-His41 distance of MproSH fluctuated more rapidly than that of MproST beyond 150 ps. These results indicate that MproT and MproST may exhibit more stable active-site configurations than MproH and MproSH, respectively.

Fig. 3. The RMSDs of the backbone Cα atoms for (A) the whole protein, (B) domain I, (C) domain II, (D) domain III, (E) substrate binding subsite S1, and (F) substrate binding subsite S2 of MproT, MproH, MproST, and MproSH, with reference to their respective starting structures, during the 200-ps MD simulations at 300 K.
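The Cys Sγ to His Nε2 distance tracked in Fig. 5 can be extracted from saved frames with a few lines of code. The sketch below assumes that frames are exported in fixed-column PDB format with standard atom names (SG for the Cys sulfur, NE2 for His Nε2); that export format is an assumption. The `atom_line` helper and its coordinates are hypothetical and exist only to build a toy record.

```python
import math

def parse_atoms(pdb_lines):
    """Map (residue name, residue number, atom name) -> (x, y, z) from ATOM/HETATM records."""
    atoms = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[17:20].strip(), int(line[22:26]), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms

def cys_his_distance(atoms, cys_num, his_num):
    """Distance (Å) between Cys SG and His NE2, e.g. Cys144/His41 in MproT."""
    return math.dist(atoms[("CYS", cys_num, "SG")], atoms[("HIS", his_num, "NE2")])

def atom_line(serial, name, res_name, res_num, xyz, chain="A"):
    """Hypothetical helper that writes one PDB-style ATOM record (columns 1-54)."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:<4} {res_name:3} {chain}{res_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Toy frame with a 3-4-5 geometry between Cys144 SG and His41 NE2.
frame = [atom_line(1, "SG", "CYS", 144, (0.0, 0.0, 0.0)),
         atom_line(2, "NE2", "HIS", 41, (3.0, 4.0, 0.0))]
distance = cys_his_distance(parse_atoms(frame), 144, 41)  # 5.0 for this toy geometry
```

Applying `cys_his_distance` to every 2-ps frame gives the time series plotted in Fig. 5. Note that MproS numbering shifts the nucleophile to Cys145.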

It is generally assumed that the native state of the active site of papain-like cysteine proteinases is a thiolate-imidazolium ion pair formed by the Cys and His residues. 54 In proteinases of the papain family, an Asn residue is the third member of the catalytic triad. Chymotrypsin and other members of this serine proteinase family have a catalytic triad consisting of Ser195-His57-Asp102. In HAV 3Cpro, Asp84 is present at the required position. 49,55 PV 3Cpro, human rhinovirus (HRV) 3Cpro, and HRV 2Apro have a Glu or Asp residue in the proper orientation to accept a hydrogen bond from the active-site His residue. 52,56,57 Both MproT and MproH have Val84 in the corresponding position, with its side chain pointing away from the catalytic site. The corresponding residue in MproS is Cys85 (Fig. 2). In both MproT and MproH, the polypeptide segment 184-199, which connects domains II and III and is probably involved in substrate binding, is held in the proper position during catalysis. The corresponding segment is also present in MproS, although its amino acid sequence is not conserved relative to those of MproT and MproH (Fig. 2). Because a direct involvement in catalysis of His163 or Asp186 of MproT, Gln163 or Asp186 of MproH, or His164 or Asp187 of MproS appears unlikely, these proteinases represent a clear case of viral cysteine proteinases employing only a catalytic dyad.


Substrate-binding subsites 

It has been shown previously that, similarly to 3Cpro, 49,52,56 specific substrate binding by Mpro is ensured by well-defined S1 and S2 binding pockets. 26 It has also been shown that the imidazole side chain of a conserved His residue, located at the center of a hydrophobic pocket, interacts with the P1 carboxamide side chain of the substrate. This specific interaction is generally considered to determine the specificity of picornavirus 3Cpro for a Gln residue at P1. 49,52,56 The totally conserved His162 of MproT and MproH (His163 of MproS) is located at the very bottom of this hydrophobic pocket, which is formed by the totally conserved residue Phe139 of MproT and MproH (Phe140 of MproS) and by the main-chain atoms of Ile140, Leu164, Glu165, and His171 of MproT; Ile140, Ile164, Glu165, and His171 of MproH; or Leu141, Met165, Glu166, and His172 of MproS. The totally conserved Glu165 of MproT and MproH (Glu166 of MproS) forms an ion pair with the totally conserved His171 of MproT and MproH (His172 of MproS). 26 This salt bridge lies on the periphery of these molecules, forming part of the outer wall of the S1 subsite.

Fig. 4. Secondary structures assigned by DSSP 40 as a function of MD simulation time for (A) MproT, (B) MproH, (C) MproST, and (D) MproSH. α-Helix, β-sheet, turn, and coil are shown in red, light yellow, blue, and green, respectively.

Fig. 5. The linear distance between the sulfur atom of the
nucleophilic Cys residue and the Nε2 atom of the general
acid-base catalyst His residue as a function of MD simulation
time for MproT, MproH, MproST, and MproSH.

Coronavirus Mpro has a strong preference for a Leu residue at
the P2 position.9 1 Similar to the S1 subsite, the putative S2
subsite identified in the structure is also a hydrophobic
pocket that is suitably positioned and large enough to
accommodate a Leu side chain easily. In both MproT and MproH,
the S2 pocket is lined by the side chains of His41, Thr47,
Ile51, Leu164, and Pro188, except that Leu164 is replaced by
Ile in MproH. In MproS, the S2 pocket is lined by the side
chains of His41, Asp48, Pro52, Met165, and Gln189. This
indicates that the S2 pocket is less conserved than the S1
pocket, probably because the S2 subsite is located at the open
mouth of the catalytic cleft formed by domains I and II, while
the S1 subsite is situated at the very bottom of this cleft.
The structural conservation of the S1 subsite may therefore be
necessary to maintain the structural integrity of both domains
I and II. It is worth mentioning that the main chain of Leu164
of MproT (Ile164 of MproH or Met165 of MproS) forms part of
the S1 subsite, while its side chain is involved in the S2
subsite. This suggests that the two subsites influence each
other to some extent during substrate binding.
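The conservation argument above can be made concrete by counting, position by position, how many pocket-lining residue types MproT and MproS share. This is a simplified sketch under stated assumptions: residues are paired in the order the text lists them (so Phe139 pairs with Phe140 and Thr47 with Asp48), only residue identity is compared, and the list and function names are hypothetical. It also ignores that several S1 residues contribute only main-chain atoms, so identity is a rough proxy for pocket conservation.

```python
from typing import Sequence

# Pocket-lining residues as listed in the text, paired in the order given.
# MproS numbering runs one ahead of MproT (e.g. Phe139 -> Phe140).
S1_MPRO_T = ["Phe139", "His162", "Ile140", "Leu164", "Glu165", "His171"]
S1_MPRO_S = ["Phe140", "His163", "Leu141", "Met165", "Glu166", "His172"]
S2_MPRO_T = ["His41", "Thr47", "Ile51", "Leu164", "Pro188"]
S2_MPRO_S = ["His41", "Asp48", "Pro52", "Met165", "Gln189"]


def residue_type(label: str) -> str:
    """Drop the sequence number from a label such as 'Phe139'."""
    return label.rstrip("0123456789")


def pocket_identity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of paired positions whose residue types are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("pockets must be non-empty and paired one-to-one")
    same = sum(residue_type(x) == residue_type(y) for x, y in zip(a, b))
    return same / len(a)


if __name__ == "__main__":
    print("S1 identity:", round(pocket_identity(S1_MPRO_T, S1_MPRO_S), 2))
    print("S2 identity:", round(pocket_identity(S2_MPRO_T, S2_MPRO_S), 2))
```

With this pairing, four of the six S1 positions are identical between MproT and MproS, but only one of the five S2 positions (His41) is, in line with the observation that S2 is the less conserved subsite.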

Fig. 6. The ASAs of the substrate binding subsites (A) S1 and
(B) S2 as a function of MD simulation time for MproT, MproH,
MproST, and MproSH.

Fig. 6 shows the ASAs of both the S1 and S2 subsites for
MproT, MproH, MproST, and MproSH as a function of MD
simulation time. Both subsites are flexible enough to
accommodate the substrates. To give a clearer view of these
structures, snapshots of the S1 and S2 subsites of these
proteins with the smallest and largest ASAs sampled from the
200 ps MD simulations are illustrated in Fig. 7. The smallest
ASA of S1 is 212, 95.4, 150, and 233 Å², sampled at 38, 116,
146, and 2 ps, while the largest ASA of S1 is 360, 158, 286,
and 361 Å², sampled at 88, 30, 28, and 94 ps, for MproT,
MproH, MproST, and MproSH, respectively. The smallest ASA of
S2 is 117, 107, 290, and 143 Å², sampled at 2, 118, 176, and
4 ps, while the largest ASA of S2 is 344, 217, 461, and
296 Å², sampled at 158, 196, 26, and 158 ps, for MproT, MproH,
MproST, and MproSH, respectively. Interestingly, the sizes and
conformations of the smallest and the largest S1 pockets of
MproSH are very similar to those of MproT. The size and
conformation of the S2 subsite vary more than those of the S1
subsite during the MD simulations, probably because S2 is
fully exposed to the solvent and thus more susceptible to
structural change. The structural variation of these two
subsites allows them to accommodate the specific recognition
residues of the substrates upon binding.
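The comparison between the two subsites can be checked directly against the values reported above. The sketch below tabulates the reported smallest and largest ASAs and takes their difference as a simple spread measure; the choice of metric (absolute range) and the names used are assumptions made for illustration.

```python
from typing import Dict, Tuple

# (smallest, largest) ASA in Å^2 over the 200 ps runs, as reported in the text.
REPORTED_ASA: Dict[str, Dict[str, Tuple[float, float]]] = {
    "S1": {"MproT": (212.0, 360.0), "MproH": (95.4, 158.0),
           "MproST": (150.0, 286.0), "MproSH": (233.0, 361.0)},
    "S2": {"MproT": (117.0, 344.0), "MproH": (107.0, 217.0),
           "MproST": (290.0, 461.0), "MproSH": (143.0, 296.0)},
}


def asa_spread(subsite: str) -> Dict[str, float]:
    """Largest minus smallest ASA (Å^2) per protein for one subsite."""
    # Rounded to one decimal to hide floating-point noise (e.g. 158.0 - 95.4).
    return {protein: round(largest - smallest, 1)
            for protein, (smallest, largest) in REPORTED_ASA[subsite].items()}


if __name__ == "__main__":
    s1, s2 = asa_spread("S1"), asa_spread("S2")
    for protein in s1:
        print(f"{protein}: S1 range {s1[protein]}, S2 range {s2[protein]} (A^2)")
```

Measured as an absolute range, S2 varies more than S1 for all four proteins. A largest/smallest ratio would not give the same picture for MproST (461/290 ≈ 1.59 for S2 versus 286/150 ≈ 1.91 for S1), so the comparison depends on how variation is measured.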

Fig. 7. Snapshots of the substrate binding subsites S1 and S2
for MproT, MproH, MproST, and MproSH with the smallest and the
largest ASAs during the 200 ps MD simulations. The protein
residues are illustrated in CPK, with the residues forming
these subsites shown in red. The residues lining the
hydrophobic pockets of subsites S1 and S2 are labeled. The
smallest and largest ASA values for each protein and the time
points at which the structures were sampled are given at the
upper-right and lower-right corners of each frame,
respectively.

In conclusion, the comparative (homology) modeling approach
was successfully applied in this study to construct the models
MproST and MproSH based on the crystal structures of MproT and
MproH, respectively. Molecular dynamics simulations were
subsequently conducted to investigate the dynamic behavior of
the structural elements of these models. Although the models
share many common features, the most significant difference
occurs at the S2 subsite, where the amino acid residues lining
the subsite are least conserved. This may pose a critical
challenge for designing anti-SARS drugs simply by screening
known databases of proteinase inhibitors.